Phishing websites continue to successfully exploit user vulnerabilities in household and enterprise settings. Existing anti-phishing tools lack the accuracy and generalizability needed to protect Internet users and organizations from the myriad of attacks encountered daily. Consequently, users often disregard these tools' warnings. In this study, using a design science approach, we propose a novel method for detecting phishing websites. By adopting a genre theoretic perspective, the proposed genre tree kernel method utilizes fraud cues that are associated with differences in purpose between legitimate and phishing websites, manifested through genre composition and design structure, resulting in enhanced anti-phishing capabilities. To evaluate the genre tree kernel method, a series of experiments were conducted on a testbed encompassing thousands of legitimate and phishing websites. The results revealed that the proposed method provided significantly better detection capabilities than state-of-the-art anti-phishing methods. An additional experiment demonstrated the effectiveness of the genre tree kernel technique in user settings; users utilizing the method were able to better identify and avoid phishing websites, and were consequently less likely to transact with them. Given the extensive monetary and social ramifications associated with phishing, the results have important implications for future anti-phishing strategies. More broadly, the results underscore the importance of considering intention/purpose as a critical dimension for automated credibility assessment: focusing not only on the "what" but rather on operationalizing the "why" into salient detection cues.
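To make the tree kernel idea concrete, the sketch below shows a minimal convolution-style kernel over labelled trees, the general family the genre tree kernel belongs to. It counts matching subtree fragments between two page-structure trees; the node labels and tree shapes are purely illustrative and do not reproduce the paper's actual genre features.

```python
# A labelled tree is (label, [children]); e.g. a page's block structure.
def tree_kernel(t1, t2):
    """Convolution-style tree kernel: sum, over all node pairs, of the
    number of matching subtree fragments rooted at those nodes."""
    def common(n1, n2):
        # Fragments match only when labels and child counts agree.
        if n1[0] != n2[0] or len(n1[1]) != len(n2[1]):
            return 0
        prod = 1
        for a, b in zip(n1[1], n2[1]):
            prod *= 1 + common(a, b)
        return prod
    def nodes(t):
        out = [t]
        for child in t[1]:
            out.extend(nodes(child))
        return out
    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))
```

A kernel value like this can be plugged directly into any kernel-based classifier, so structurally similar pages score high against one another while pages with different compositions score low.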
Effective search support is an important tool for helping individuals deal with the problem of information overload. This is particularly true in the field of nanotechnology, where information from patents, grants, and research papers is growing rapidly. Guided by cognitive fit and cognitive load theories, we develop an advanced Web-based system, Nano Mapper, to support users' search and analysis of nanotechnology developments. We perform controlled experiments to evaluate the functions of Nano Mapper. We examine users' search effectiveness, efficiency, and evaluations of system usefulness, ease of use, and satisfaction. Our results demonstrate that Nano Mapper enables more effective and efficient searching, and users consider it to be more useful and easier to use than the benchmark systems. Users are also more satisfied with Nano Mapper and have higher intention to use it in the future. User evaluations of the analysis functions are equally positive.
Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.
Increasing global connectivity makes emerging infectious diseases (EID) more threatening than ever before. Various information systems (IS) projects have been undertaken to enhance public health capacity for detecting EID in a timely manner and disseminating important public health information to concerned parties. While those initiatives seemed to offer promising solutions, public health researchers and practitioners raised concerns about their overall effectiveness. In this paper, we argue that the concerns about current public health IS projects are partially rooted in the lack of a comprehensive framework that captures the complexity of EID management to inform and evaluate the development of public health IS. We leverage the theory of loose coupling to analyze news coverage and contact tracing data from 479 patients associated with the severe acute respiratory syndrome (SARS) outbreak in Taiwan. From this analysis, we develop a framework for outbreak management. Our proposed framework identifies two types of causal circles—coupling and decoupling circles—between the central public health administration and the local capacity for detecting unusual patient cases. These two circles are triggered by important information-centric activities in public health practices and can have significant influence on the effectiveness of EID management. We derive seven design guidelines from the framework and our analysis of the SARS outbreak in Taiwan to inform the development of public health IS. We leverage the guidelines to evaluate current public health initiatives. By doing so, we identify limitations of existing public health IS, highlight the direction future development should consider, and discuss implications for research and public health policy.
Fake websites have become increasingly pervasive, generating billions of dollars in fraudulent revenue at the expense of unsuspecting Internet users. The design and appearance of these websites makes it difficult for users to manually identify them as fake. Automated detection systems have emerged as a mechanism for combating fake websites; however, most are fairly simplistic in the fraud cues and detection methods they employ. Consequently, existing systems are susceptible to the myriad of obfuscation tactics used by fraudsters, resulting in highly ineffective fake website detection performance. In light of these deficiencies, we propose the development of a new class of fake website detection systems that are based on statistical learning theory (SLT). Using a design science approach, a prototype system was developed to demonstrate the potential utility of this class of systems. We conducted a series of experiments, comparing the proposed system against several existing fake website detection systems on a test bed encompassing 900 websites. The results indicate that systems grounded in SLT can more accurately detect various categories of fake websites by utilizing richer sets of fraud cues in combination with problem-specific knowledge. Given the hefty cost exacted by fake websites, the results have important implications for e-commerce and online security.
Knowledge management is essential to modern organizations. Due to the information overload problem, managers face critical challenges in utilizing the data in their organizations. Although several automated tools have been applied, previous applications often treat knowledge items as independent and rely solely on their content, which may limit their analytical capabilities. This study focuses on the process of knowledge evolution and proposes to incorporate this perspective into knowledge management tasks. Using a patent classification task as an example, we represent knowledge evolution processes with patent citations and introduce a labeled citation graph kernel to classify patents under a kernel-based machine learning framework. In the experimental study, our proposed approach shows more than 30 percent improvement in classification accuracy compared to traditional content-based methods. The approach has the potential to improve existing patent management procedures. Moreover, this research lends strong support to considering knowledge evolution processes in other knowledge management tasks.
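As a rough illustration of how a labelled graph kernel compares two citation structures, the toy sketch below counts shared labelled walks between two small directed graphs. This is a generic walk-counting kernel, not the paper's actual labeled citation graph kernel, and the node labels are hypothetical stand-ins for patent classes.

```python
from collections import Counter

def walk_kernel(g1, g2, length=2):
    """Count shared labelled walks up to `length` edges between two graphs.
    A graph is (labels: dict node -> label, edges: list of (u, v))."""
    def walks(g, max_len):
        labels, edges = g
        adj = {}
        for u, v in edges:
            adj.setdefault(u, []).append(v)
        seqs = Counter()
        frontier = [(n, (labels[n],)) for n in labels]  # length-0 walks
        for _ in range(max_len):
            nxt = []
            for n, seq in frontier:
                seqs[seq] += 1
                for m in adj.get(n, []):
                    nxt.append((m, seq + (labels[m],)))  # extend by one edge
            frontier = nxt
        for n, seq in frontier:
            seqs[seq] += 1
        return seqs
    w1, w2 = walks(g1, length), walks(g2, length)
    # Kernel value: inner product of label-sequence count vectors.
    return sum(w1[s] * w2[s] for s in w1)
```

Two patents whose citation chains pass through the same sequence of classes thus score higher against each other than patents with divergent citation histories.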
Online reputation systems are intended to facilitate the propagation of word of mouth as a credibility scoring mechanism for improved trust in electronic marketplaces. However, they experience two problems attributable to anonymity abuse: easy identity changes and reputation manipulation. In this study, we propose the use of stylometric analysis to help identify online traders based on the writing style traces inherent in their posted feedback comments. We incorporated a rich stylistic feature set and developed the Writeprint technique for detection of anonymous trader identities. The technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 200 eBay traders. Experiments conducted to assess the scalability (number of traders) and robustness (against intentional obfuscation) of the proposed approach found it to significantly outperform benchmark stylometric techniques. The results indicate that the proposed method may help mitigate easy identity changes and reputation manipulation in electronic markets.
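The core stylometric intuition can be sketched in a few lines: represent each comment by a style profile and compare profiles. The version below uses only a character-frequency profile with cosine similarity, a deliberately tiny stand-in for the rich Writeprint feature set (which includes many more lexical, syntactic, and structural features); the sample comments are invented.

```python
import math
from collections import Counter

def style_vector(text):
    """A toy writeprint: normalised lowercase character-frequency profile."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)
```

In an identification setting, an unattributed comment would be assigned to the known trader whose profile it most resembles; obfuscation resistance then hinges on using features that are hard for an author to consciously suppress.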
Content analysis of computer-mediated communication (CMC) is important for evaluating the effectiveness of electronic communication in various organizational settings. CMC text analysis relies on systems capable of providing suitable navigation and knowledge discovery functionalities. However, existing CMC systems focus on structural features, with little support for features derived from message text. This deficiency is attributable to the informational richness and representational complexities associated with CMC text. In order to address this shortcoming, we propose a design framework for CMC text analysis systems. Grounded in systemic functional linguistic theory, the proposed framework advocates the development of systems capable of representing the rich array of information types inherent in CMC text. It also provides guidelines regarding the choice of features, feature selection, and visualization techniques that CMC text analysis systems should employ. The CyberGate system was developed as an instantiation of the design framework. CyberGate incorporates a rich feature set and complementary feature selection and visualization methods, including the writeprints and ink blots techniques. An application example was used to illustrate the system's ability to discern important patterns in CMC text. Furthermore, results from numerous experiments conducted in comparison with benchmark methods confirmed the viability of CyberGate's features and techniques. The results revealed that the CyberGate system and its underlying design framework can dramatically improve CMC text analysis capabilities over those provided by existing systems.
Information overload often hinders knowledge discovery on the Web. Existing tools lack analysis and visualization capabilities, and search engine displays often overwhelm users with irrelevant information. This research proposes a visual framework for knowledge discovery on the Web. The framework incorporates Web mining, clustering, and visualization techniques to support effective exploration of knowledge. Two new browsing methods were developed and applied to the business intelligence domain: Web community uses a genetic algorithm to organize Web sites into a tree format; knowledge map uses a multidimensional scaling algorithm to place Web sites as points on a screen. Experimental results show that knowledge map outperformed Kartoo, a commercial search engine with graphical display, in terms of effectiveness and efficiency. Web community was found to be more effective, efficient, and usable than a traditional result list. Our visual framework thus helps to alleviate information overload on the Web and offers practical implications for search engine developers.
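The knowledge map idea, placing items on a screen so that screen distance reflects dissimilarity, can be sketched with a simple stress-minimising multidimensional scaling loop. This is a generic iterative MDS variant under illustrative parameters, not the system's actual algorithm.

```python
import math
import random

def mds_layout(dissim, iters=500, lr=0.05, seed=0):
    """Iteratively move 2-D points so that on-screen distances approximate
    the given pairwise dissimilarity matrix (stress-minimising MDS)."""
    rng = random.Random(seed)
    n = len(dissim)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                # Pull i toward j if too far apart, push away if too close.
                err = (d - dissim[i][j]) / d
                pos[i][0] -= lr * err * dx
                pos[i][1] -= lr * err * dy
    return pos
```

Given dissimilarities computed from, say, Web site content overlap, the resulting coordinates can be rendered directly as a clickable map, which is the essence of the knowledge map display.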
The volume of qualitative data (QD) available via the Internet is growing at an increasing pace and firms are anxious to extract and understand users' thought processes, wants and needs, attitudes, and purchase intentions contained therein. An information systems (IS) methodology to meaningfully analyze this vast resource of QD could provide useful information, knowledge, or wisdom firms could use for a number of purposes including new product development and quality improvement, target marketing, accurate 'user-focused' profiling, and future sales prediction. In this paper, we present an IS methodology for analysis of Internet-based QD consisting of three steps: elicitation; reduction through IS-facilitated selection, coding, and clustering; and visualization to provide at-a-glance understanding. Outcomes include information (relationships), knowledge (patterns), and wisdom (principles) explained through visualizations and drill-down capabilities. We first present the generic methodology and then discuss an example employing it to analyze free-form comments from potential consumers who viewed soon-to-be-released film trailers, illustrating how the methodology and tools can provide rich and meaningful affective, cognitive, contextual, and evaluative information, knowledge, and wisdom. The example revealed that qualitative data analysis (QDA) accurately reflected film popularity. QDA also provided a predictive measure of the relative magnitude of film popularity between the most popular film and the least popular one, based on actual first-week box office sales. The methodology and tools used in this preliminary study illustrate that value can be derived from analysis of Internet-based QD and suggest that further research in this area is warranted.
The Kohonen Self-Organizing Map (SOM) is an unsupervised learning technique for summarizing high-dimensional data so that similar inputs are, in general, mapped close to one another. When applied to textual data, SOM has been shown to be able to group together related concepts in a data collection and to present major topics within the collection with larger regions. This article presents research in which the authors sought to validate these properties of SOM, called the Proximity and Size Hypotheses, through a user evaluation study. Building upon their previous research in automatic concept generation and classification, they demonstrated that the Kohonen SOM was able to perform concept clustering effectively, based on its concept precision and recall scores as judged by human experts. They also demonstrated a positive relationship between the size of an SOM region and the number of documents contained in the region. They believe this research has established the Kohonen SOM algorithm as an intuitively appealing and promising neural-network-based textual classification technique for addressing part of the longstanding "information overload" problem.
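The Proximity Hypothesis, that similar inputs land on nearby map units, can be illustrated with a minimal pure-Python SOM. The grid size, learning schedule, and toy 2-D data below are all illustrative choices, not the study's experimental setup, which used high-dimensional textual data.

```python
import math
import random

def train_som(data, grid_w=4, grid_h=4, dim=2, epochs=200, seed=0):
    """Tiny Kohonen SOM: each grid cell holds a weight vector; the
    best-matching unit (BMU) and its neighbours move toward each input."""
    rng = random.Random(seed)
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(grid_w) for j in range(grid_h)}
    for t in range(epochs):
        lr = 0.5 * (1 - t / epochs)                          # decaying learning rate
        radius = max(1.0, (grid_w / 2) * (1 - t / epochs))   # shrinking neighbourhood
        for x in data:
            bmu = bmu_of(weights, x)
            for u, w in weights.items():
                d = math.dist(u, bmu)
                if d <= radius:
                    h = math.exp(-d * d / (2 * radius * radius))
                    weights[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return weights

def bmu_of(weights, x):
    """Grid coordinate of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - xi) ** 2
                                          for w, xi in zip(weights[u], x)))
```

After training on two well-separated clusters, inputs from the same cluster should share a BMU or land on adjacent units, while inputs from different clusters map to distant units, which is exactly the property the study set out to validate with human judges.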
Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other, newer artificial intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms. The newer techniques have provided great opportunities for researchers to experiment with diverse paradigms for effective information processing and retrieval. In this article we first provide an overview of these newer techniques and their usage in information science research. We then present in detail the algorithms we adopted for a hybrid system based on genetic algorithms and neural networks, called GANNET. GANNET performed concept (keyword) optimization for user-selected documents during information retrieval using genetic algorithms. It then used the optimized concepts to perform concept exploration in a large network of related concepts through the Hopfield net parallel relaxation procedure. Based on a test collection of about 3,000 articles from DIALOG and an automatically created thesaurus, and using Jaccard's score as a performance measure, our experiment showed that GANNET improved Jaccard's scores by about 50 percent and helped identify the underlying concepts (keywords) that best describe the user-selected documents.
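The genetic-algorithm half of this pipeline, evolving a keyword subset that best matches the user-selected documents under Jaccard's score, can be sketched as below. The candidate keywords, document sets, and GA parameters are invented for illustration; this is not GANNET's actual implementation, and the Hopfield-net exploration stage is omitted.

```python
import random

def ga_optimize(candidates, docs, pop=30, gens=60, seed=0):
    """Toy GA: evolve a keyword subset (bitstring over `candidates`) that
    maximises the average Jaccard score against the documents' keyword sets."""
    rng = random.Random(seed)

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    def fitness(bits):
        chosen = {k for k, bit in zip(candidates, bits) if bit}
        return sum(jaccard(chosen, d) for d in docs) / len(docs)

    popn = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        popn = popn[: pop // 2]                       # selection: keep top half
        while len(popn) < pop:
            p1, p2 = rng.sample(popn[: pop // 4], 2)  # parents from the elite
            cut = rng.randrange(1, len(candidates))
            child = p1[:cut] + p2[cut:]               # single-point crossover
            if rng.random() < 0.2:                    # occasional bit-flip mutation
                child[rng.randrange(len(candidates))] ^= 1
            popn.append(child)
    best = max(popn, key=fitness)
    return {k for k, bit in zip(candidates, best) if bit}, fitness(best)
```

The evolved keyword set would then seed the second stage, where spreading activation over a thesaurus network suggests related concepts the user did not select explicitly.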